An ultra-fast and scalable quantification pipeline for transposable elements from next generation sequencing data.
نویسندگان
چکیده
Transposable elements (TEs) are DNA sequences which are capable of moving from one location to another and represent a large proportion (45%) of the human genome. TEs have functional roles in a variety of biological phenomena such as cancer, neurodegenerative disease, and aging. Rapid development in RNA-sequencing technology has enabled us, for the first time, to study the activity of TE at the systems level.However, efficient TE analysis tools are not yet developed. In this work, we developed SalmonTE, a fast and reliable pipeline for the quantification of TEs from RNA-seq data. We benchmarked our tool against TEtranscripts, a widely used TE quantification method, and three other quantification methods using several RNA-seq datasets from Drosophila melanogaster and human cell-line. We achieved 20 times faster execution speed without compromising the accuracy. This pipeline will enable the biomedical research community to quantify and analyze TEs from large amounts of data and lead to novel TE centric discoveries.
منابع مشابه
T-lex: a program for fast and accurate assessment of transposable element presence using next-generation sequencing data
Transposable elements (TEs) are repetitive DNA sequences that are ubiquitous, extremely abundant and dynamic components of practically all genomes. Much effort has gone into annotation of TE copies in reference genomes. The sequencing cost reduction and the newly available next-generation sequencing (NGS) data from multiple strains within a species offer an unprecedented opportunity to study po...
متن کاملFastViromeExplorer: a pipeline for virus and phage identification and abundance profiling in metagenomics data
With the increase in the availability of metagenomic data generated by next generation sequencing, there is an urgent need for fast and accurate tools for identifying viruses in host-associated and environmental samples. In this paper, we developed a stand-alone pipeline called FastViromeExplorer for the detection and abundance quantification of viruses and phages in large metagenomic datasets ...
متن کاملRetroSeq: transposable element discovery from next-generation sequencing data
UNLABELLED A significant proportion of eukaryote genomes consist of transposable element (TE)-derived sequence. These elements are known to have the capacity to modulate gene function and genome evolution. We have developed RetroSeq for detecting non-reference TE insertions from Illumina paired-end whole-genome sequencing data. We evaluate RetroSeq on a human trio from the 1000 Genomes Project,...
متن کاملA Pipeline for Identifying Integration Sites of Mobile Elements in the Genome Using Next-Generation Sequencing
Next Generation Sequencing (NGS) reads obtained by sequencing of the junction of a mobile element and the host flanking region from individuals in a population are typically mapped to a reference genome to determine the location of the mobile element-host junction. We propose a clustering pipeline for grouping such NGS data into clusters corresponding to the locations of integration sites in th...
متن کاملBtrim: A fast, lightweight adapter and quality trimming program for next-generation sequencing technologies
Btrim is a fast and lightweight software to trim adapters and low quality regions in reads from ultra high-throughput next-generation sequencing machines. It also can reliably identify barcodes and assign the reads to the original samples. Based on a modified Myers's bit-vector dynamic programming algorithm, Btrim can handle indels in adapters and barcodes. It removes low quality regions and tr...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing
دوره 23 شماره
صفحات -
تاریخ انتشار 2018